TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domains
Abstract
Building Statistical Machine Translation (SMT) systems requires large parallel corpora. We describe the TransDoop system for gathering translations to create parallel corpora from an online crowd workforce whose members are familiar with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing in which sentence translation is decomposed into the following smaller tasks: (a) translation of the constituent phrases of the sentence; (b) validation of the quality of the phrase translations; and (c) composition of complete sentence translations from the phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain such as judicial proceedings, the map-reduce based approach obtains higher scores than complete sentence translation, which establishes the efficacy of our work.
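The three-step decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the TransDoop implementation. Crowd workers are simulated by plain functions, and the fixed-size phrase chunking, the majority-vote validation rule, the in-order composition, and every name below (`split_into_phrases`, `map_translate`, `validate`, `reduce_compose`, `TOY_LEXICON`) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, List

# A simulated crowd worker: takes a source phrase, returns a proposed translation.
Worker = Callable[[str], str]


def split_into_phrases(sentence: str, size: int = 2) -> List[str]:
    """Naive fixed-size chunking; an assumption, not the system's segmentation."""
    words = sentence.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def map_translate(phrases: List[str], workers: List[Worker]) -> Dict[str, List[str]]:
    """Task (a): every worker proposes a translation for every phrase."""
    return {phrase: [worker(phrase) for worker in workers] for phrase in phrases}


def validate(candidates: Dict[str, List[str]], min_votes: int = 2) -> Dict[str, str]:
    """Task (b): keep the most frequent candidate if enough workers agree on it."""
    accepted = {}
    for phrase, options in candidates.items():
        best, votes = Counter(options).most_common(1)[0]
        if votes >= min_votes:
            accepted[phrase] = best
    return accepted


def reduce_compose(phrases: List[str], accepted: Dict[str, str]) -> str:
    """Task (c): join validated phrases in source order.

    Phrases without a validated translation are kept in brackets so the gap
    stays visible. In a real crowd setting a worker would reorder and smooth
    the phrases; simple concatenation is a stand-in for that step.
    """
    return " ".join(accepted.get(phrase, f"[{phrase}]") for phrase in phrases)


# Toy lexicon used by the simulated workers; illustrative, not real data.
TOY_LEXICON = {"the court": "el tribunal", "adjourned today": "aplazado hoy"}


def careful_worker(phrase: str) -> str:
    return TOY_LEXICON.get(phrase, phrase)


def noisy_worker(phrase: str) -> str:
    return phrase.upper()  # simulates a low-quality submission


if __name__ == "__main__":
    phrases = split_into_phrases("the court adjourned today")
    candidates = map_translate(phrases, [careful_worker, careful_worker, noisy_worker])
    print(reduce_compose(phrases, validate(candidates)))
```

Raising `min_votes` above the number of agreeing workers leaves phrases unvalidated, which the sketch surfaces as bracketed source text rather than silently accepting a disputed translation.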
Similar resources
IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations
Recent work on machine translation has used crowdsourcing to reduce costs of manual evaluations. However, crowdsourced judgments are often biased and inaccurate. In this paper, we present a statistical model that aggregates many manual pairwise comparisons to robustly measure a machine translation system’s performance. Our method applies graded response model from item response theory (IRT), wh...
Image encryption based on chaotic tent map in time and frequency domains
The present paper is aimed at introducing a new algorithm for image encryption using chaotic tent maps and the desired key image. This algorithm consists of two parts, the first of which works in the frequency domain and the second, in the time domain. In the frequency domain, a desired key image is used, and a random number is generated, using the chaotic tent map, in order to change the phase...
Selected Crowdsourced Translation Practices
This paper contains research related to workflow and design patterns. It briefly discusses the suitability of industry tools for crowdsourcing processes in terms of workflow pattern support. After listing a number of practices identified by analysing crowdsourced translation workflow models, the paper discusses four of the practices and presents two recommendations based on the scenarios of rea...
On Analytical Study of Self-Affine Maps
Self-affine maps were successfully used for edge detection, image segmentation, and contour extraction. They belong to the general category of patch-based methods. Particularly, each self-affine map is defined by one pair of patches in the image domain. By minimizing the difference between these patches, the optimal translation vector of the self-affine map is obtained. Almost all image process...